(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 817 099 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 07.01.1998 Bulletin 1998/02
(51) Int. Cl.6: G06F 17/30
(21) Application number: 97110249.6
(22) Date of filing: 23.06.1997
(84) Designated Contracting States:
     AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
(30) Priority: 24.06.1996 US 668877
(71) Applicant:
     SUN MICROSYSTEMS, INC.
     Mountain View, California 94043-1100 (US)
(72) Inventor: Nielsen, Jakob
     Atherton, California 94027 (US)
(74) Representative:
     Pellmann, Hans-Bernd, Dipl.-Ing. et al
     Patentanwaltsbüro
     Tiedtke-Bühling-Kinne & Partner
     Bavariaring 4
     80336 München (DE)

(54) Client-side, Server-side and collaborative spell check of URL's

(57) Spell checking of network addresses such as
Uniform Resource Locator (URL) addresses is provided
at three levels. Each is invoked when a connection to
the specified network address cannot be established.
At a client level, the specified URL is compared
with URLs previously successfully used to find
candidate misspellings. At a server level, directory and file
names are checked against corresponding components
of the URL to which connection was requested to return
a list of candidate correct spellings to the requestor.
Excluded from the list returned to the requestor are the
correct spellings of "hidden" files to which general
access is not desired. At a network access provider
level, information about URLs successfully used by all
customers is accumulated and used to provide a
candidate list of correct spellings to a user. Older entries are
periodically pruned from the database to control size.

Description

BACKGROUND OF THE INVENTION 

Field of the Invention 

The invention relates to computer communications
systems and more particularly to spell checking of
resource identifications in a network environment.

Description of Related Art 

In order to access specific World-Wide-Web
(WWW) pages, users must often enter the Uniform
Resource Locator (URL) which provides the address of
the page on a remote server. However, as WWW browsers
evolved, the focus of the user interface has been to
allow users to access remote pages by selecting
hypertext links, thus often removing the need to manually
enter URLs. Scant attention has been paid to the
problems inherent in manual URL entry. Yet the explosive
growth of the WWW has made it inconvenient to follow
a long series of hypertext links to retrieve a page
desired by the user; in fact, companies, organizations
and individuals often provide their URLs in television
advertisements, on printed materials, and verbally. This
has led to a growing number of instances when the user
would prefer to directly enter the URL in the browser.

A major problem with the manual entry of URLs is
the introduction of spelling errors, which are particularly
common because of the characteristics of URL syntax
and structure. URLs are often long and often include terms,
such as "http", "com", "org", "gif", "jpeg", that are not
commonly known by users. URLs may also be in a
foreign language, especially for those users in non-English
speaking countries. Additionally, the URL may include
odd special characters such as ~, \ and @ that are
difficult to type and hard to remember. The fact that URLs
interpret upper and lower case letters differently is yet
another source of user input error. Finally, the user is
often relying on a quickly made note or just his memory
from a brief appearance of a URL or from a spoken URL
in an advertisement. All of these factors taken together
provide a rich basis for the introduction of spelling errors
during manual entry of URLs.

In order to assist the user with manual URL entry, a
spelling checker is needed. Spell checking in general is
well established in the art, with numerous different
implementational schemes. The central idea of a
spelling checker is to take the word in question and compare
it to a dictionary of legal spellings to find one or more
words that are spelled roughly the same way and to
then provide the user the ability to choose the correct
word from a list presented by the spelling checking
program.

However, traditional spelling checkers, using the
prior art, are unsuitable for use in the WWW
environment for several reasons. The dynamic nature of the
WWW, where new URLs are constantly being created,
precludes the use of a static dictionary. The sheer
number of URLs precludes the use of a dynamic
dictionary: as of April 1996 there were more than 30 million
URLs on the WWW. Additionally, since the WWW
operates in a client-server environment, only the server
knows what URLs are valid for accessing WWW pages
residing on that server. Servers often contain files
(pages) that are not intended for general use and the
server administrators rely on the fact that only users
who know the exact URLs can retrieve those files. The
introduction of sophisticated spelling checkers for URLs
must take this fact into account. Finally, the prior art
provides no mechanism for utilizing knowledge obtained
from other users' behavior.

As an example of the prior art, Netscape's Navigator
WWW browser performs a simplistic spelling check
on manually entered URLs. Specifically, the program
tries to identify and correct problems with the protocol
and server names. The program will try adding "http://"
to the URL if no protocol is specified; it will also add
"www." before and ".com" after the domain name if
they are not present in the manually entered URL.
These spelling check capabilities are simple and helpful,
but are not sufficiently robust or extensive to solve the
general problem of spelling errors in manually entered
URLs.
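
The two prior-art fix-ups described above can be illustrated with a minimal sketch. This is not Netscape's actual code; the function name naive_fixup is hypothetical, and the rule of expanding only a host name that contains no dot is an assumption made to keep the example well defined.

```python
def naive_fixup(typed_url: str) -> str:
    """Apply the two simple fix-ups described above to a typed URL."""
    url = typed_url.strip()
    # Fix-up 1: add a protocol when none was typed.
    if "://" not in url:
        url = "http://" + url
    protocol, rest = url.split("://", 1)
    host, slash, path = rest.partition("/")
    # Fix-up 2 (assumption): expand only a bare host name with no dots.
    if host and "." not in host:
        host = "www." + host + ".com"
    return protocol + "://" + host + slash + path


print(naive_fixup("sun/products/index.html"))
```

Such a heuristic never consults a list of known servers or documents, which is why it cannot repair a misspelled directory or file name.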

SUMMARY OF THE INVENTION 

The present system provides apparatus, systems,
processes and software which provide a user who
manually enters a URL with a sophisticated method for spell
checking the URL to increase the probability of finding
the desired WWW page in a timely fashion.

The invention is composed of three components
that may work in concert, individually or in pairs. The
client-side component operates in conjunction with the
user's browser running on the user's computing device.
The server-side component operates on the server
computing device which contains the WWW pages the
user wishes to acquire and communicates with the user
by dynamically constructing one or more WWW pages
containing alternative spellings for the URL (as
hypertext) and sending the constructed page(s) to the user's
browser for display. The collaborative component
operates on Internet Service Providers' (ISPs') servers, or on
an organization's proxy server to maintain the protection
of whatever firewalls are in place. It communicates to
the user in a manner similar to that of the server-side
component. The collaborative component of the
invention utilizes knowledge from other users' behavior (i.e.,
the WWW pages successfully retrieved in the
past by all users) to provide a knowledge base for the
spelling checker.

The three components (client-side, server-side and
collaborative) represent three unique but
complementary methods of providing spelling check services to the
user. Each component resides on a different part of the
WWW and addresses the spelling check problem
differently. The multi-platform and dynamic nature of the
WWW suggests that a user cannot be assured that all
three components are always available, but the
invention is robust enough to utilize only those components
that are present. In fact, the components are not
coupled at all but yet are able to work together by using the
common language of the WWW, namely HyperText
Markup Language (HTML), which provides a universal
way of encoding information that any compliant browser
can display to the user.

The novel features of this invention are inter alia its
sophisticated spelling check functionality, its dynamic
nature, and its ability to leverage the experiences of
multiple users into a knowledge base that will assist all
future users.

The invention relates to apparatus for checking
spelling of network addresses, including a database
containing valid network protocol names, a database
containing valid network server names, a database
containing valid component names, and a computer
configured to analyze a network address, used in an attempt
to establish a connection to that address but which did
not result in a connection, to compare portions of that
address with a database containing corresponding
information and present to a user one or more
alternative spellings of that address if a portion of that address
does not match identically a valid entry in the database.

The invention also relates to apparatus for checking 
spelling of network addresses received at a server hav- 
ing a hierarchical directory structure from a remote user, 
including a database containing names of hidden files, 
and a computer configured to analyze network 
addresses term by term beyond the server address, to 
compare portions of an address with corresponding 
portions of the server directory and to present to the 
remote user one or more alternative spellings if a direc- 
tory or file name does not match identically a valid entry 
in the hierarchical directory, unless such an alternative 
spelling is contained in the database. 

The invention also relates to apparatus for checking 
spelling of network addresses received from a remote 
user at a network access provider, including a database 
containing remote server names to which users have 
successfully connected, a database containing network 
addresses, and a computer configured to analyze a network
address, received from a remote user which did
not result in a connection, to compare portions of that 
address with portions of each database containing cor- 
responding information and present to a user one or 
more alternative spellings if a portion of a network 
address does not identically match a valid entry in the 
databases. 

In each of the apparatus described, the one or more
alternative spellings are presented in a form, such as
HTML, so that the remote user can select one of the
alternative spellings with an input device, such as a
mouse, and attempt to connect again using the selected
alternative spelling.

The invention is also directed to a system for check- 
ing spelling of network addresses received from a user, 
including at least any two of a client spell checker, a net- 
work access provider spell checker and a server spell 
checker, resident on respective computers connected to 
the network. 

The invention is also directed to a system for check- 
ing spelling of network addresses received from a user, 
including a network, and a computer connected to the 
network configured to spell check network addresses 
and to suggest alternative spellings. The computer can
be a computer operated as a client in a client-server
mode, one operated as a server in a client-server
mode or one operated as a network access provider.

The invention is also directed to a method of
checking spelling of network addresses, by comparing a
portion of a network address received from a user which
did not result in a connection with entries in a database
containing corresponding portions of network
addresses which had previously resulted in
connections, identifying candidate matches from the database
which match imperfectly a portion of a network address,
and when one or more candidate matches is found,
providing a list of the candidate matches to the user.
Candidate matches are provided to a user in a hypertext
format.

The invention is also directed to a method of check- 
ing spelling of network addresses in a server having a 
hierarchical directory, by comparing a portion of a net- 
work address received from a remote user which did not 
result in access to a document on the server with corre- 
sponding portions of the hierarchical directory, and pre- 
senting to the remote user alternative spellings if a 
directory or file name does not match identically a valid 
entry in the hierarchical directory. Hidden files are
excluded from the list of alternative spellings presented 
to a user. 
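
As a rough illustration of the server-side method, the sketch below checks one requested name against the entries of one directory and withholds hidden names from the suggestions. It is only a sketch under stated assumptions: the function suggest_names and its parameters are hypothetical, and Python's difflib.get_close_matches is used as a stand-in similarity test rather than the matching procedure of the invention.

```python
from difflib import get_close_matches


def suggest_names(requested, directory_entries, hidden_names, cutoff=0.8):
    """Suggest visible directory entries that resemble the requested name."""
    # An exact match needs no suggestions, even when the file is hidden:
    # users who know the exact name can still retrieve it.
    if requested in directory_entries:
        return []
    # Hidden names are removed before matching so they can never be offered.
    visible = [name for name in directory_entries if name not in hidden_names]
    return get_close_matches(requested, visible, n=5, cutoff=cutoff)


entries = ["index.html", "secret.html", "images"]
hidden = {"secret.html"}
print(suggest_names("indx.html", entries, hidden))
print(suggest_names("secrt.html", entries, hidden))
```

Filtering the hidden names before matching, rather than after, keeps the hidden spellings out of every intermediate result as well as the final list.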

The invention is also directed to a method of check- 
ing spelling of network addresses at a network access 
provider, by storing remote server names and network 
addresses, to which network access provider users 
have successfully connected, in one or more databases,
comparing portions of an address received from
a network access provider user which did not result in a 
connection, with corresponding portions of the data- 
base, and presenting to the network access provider 
user alternative spellings if a portion of an address does 
not identically match a valid entry in the database. 

The invention is also directed to computer program 
products carrying out the techniques of the invention. 

Still other objects and advantages of the present
invention will become readily apparent to those skilled in
the art from the following detailed description, wherein
only the preferred embodiment of the invention is shown
and described, simply by way of illustration of the best
mode contemplated of carrying out the invention. As will
be realized, the invention is capable of other and
different embodiments, and its several details are capable of
modifications in various obvious respects, all without
departing from the invention. Accordingly, the drawing
and description are to be regarded as illustrative in
nature, and not as restrictive.

BRIEF DESCRIPTION OF DRAWINGS 

The object, features, and advantages of the system
of the present invention will be apparent from the
following description, in which:

Figures 1A and 1B illustrate environments in which the invention operates.
Figures 2 and 3 together are a flowchart for the client-side process in accordance with the invention.
Figure 4 is a flowchart for a client-side spelling checker in accordance with the invention.
Figure 5 is a flowchart of a Phase I routine used with the client-side spelling checker of Figure 4.
Figure 6 provides a detailed flowchart for Phase II of the client-side spelling checker of Figure 4.
Figure 7 provides a detailed flowchart for Phase III of the client-side spelling checker of Figure 4.
Figures 8A, 8B and 8C illustrate content of databases used by the client-side component of the invention.
Figure 9 is a flowchart of a process for updating the client-side databases.
Figures 10 and 11 together are a flowchart for the server-side spelling checker component of the invention.
Figure 12 is a flowchart for the use of the collaborative spell checker component of the invention.
Figure 13 is a flowchart of a process for dynamic and non-dynamic database pruning in the collaborative component of the invention.
Figures 14A and 14B illustrate exemplary databases used with the collaborative component of the invention.
Figure 15 is a flowchart of a process for updating the collaborative component's databases.
Figure 16A illustrates a computer of a type suitable for carrying out the invention.
Figure 16B illustrates a block diagram of the computer of Figure 16A.
Figure 16C illustrates an exemplary memory medium containing one or more programs usable with the computer of Figure 16A.

NOTATIONS AND NOMENCLATURE 

The detailed descriptions which follow may be
presented in terms of program procedures executed on a
computer or network of computers. These procedural
descriptions and representations are the means used
by those skilled in the art to most effectively convey the
substance of their work to others skilled in the art.

A procedure is here, and generally, conceived to be
a self-consistent sequence of steps leading to a desired
result. These steps are those requiring physical
manipulations of physical quantities. Usually, though not
necessarily, these quantities take the form of electrical or
magnetic signals capable of being stored, transferred,
combined, compared, and otherwise manipulated. It
proves convenient at times, principally for reasons of
common usage, to refer to these signals as bits, values,
elements, symbols, characters, terms, numbers, or the
like. It should be noted, however, that all of these and
similar terms are to be associated with the appropriate
physical quantities and are merely convenient labels
applied to these quantities.

Further, the manipulations performed are often
referred to in terms, such as adding or comparing,
which are commonly associated with mental operations
performed by a human operator. No such capability of a
human operator is necessary, or desirable in most
cases, in any of the operations described herein which
form part of the present invention; the operations are
machine operations. Useful machines for performing
the operation of the present invention include general
purpose digital computers or similar devices.

The present invention also relates to apparatus for
performing these operations. This apparatus may be
specially constructed for the required purpose or it may
comprise a general purpose computer as selectively
activated or reconfigured by a computer program stored
in the computer. The procedures presented herein are
not inherently related to a particular computer or other
apparatus. Various general purpose machines may be
used with programs written in accordance with the
teachings herein, or it may prove more convenient to
construct more specialized apparatus to perform the
required method steps. The required structure for a
variety of these machines will appear from the description
given.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The environment in which the invention will operate
is illustrated in Figures 1A and 1B. In the simplest
environment, shown in Figure 1A, the user's computing
device (110), running WWW browser software, is
attached to a network (120). The specific WWW server
(130) the user wants to access is attached to the same
network. A more complex environment is depicted in
Figure 1B, in which the user's computing device (140) is
attached to a network (150) that is attached to an Internet
Service Provider (ISP) server (160) which is, in turn,
connected to another network (170) providing a
connection to the specific WWW server (180). The various
components of the invention may or may not be installed
at each computing device or server, but the client-side
component could be installed on the user's computing
device (110, 140), the server-side component could be
installed on the specific WWW server being accessed
(130, 180) and the collaborative component could be
installed on the ISP server (160).

Figures 2 through 9 provide flowcharts and dia- 
grams to demonstrate the preferred embodiment of the 
client-side component of the invention. 

Figure 2 provides a flowchart for the operation of
the client-side portion of the invention and makes no
assumptions regarding the deployment of the other
components of the invention. The process begins when
the browser sends a request for a particular WWW
document or page (202). If the browser does not receive a
"Server Not Found" error (204) and it does not receive
a "Document Not Found" error (244) then the
document is displayed by the browser (240), the client-side
component's databases are updated (238 and Figures
8 and 9), and the process is terminated (236).

If, however, the browser receives a "Server Not
Found" error (204) and the URL was not entered by the
user (206, i.e., the user attempted to access the URL
via a hypertext link), then the standard "Server Not
Found" error message is displayed by the browser (234)
and the process is terminated (236). Alternatively, if the
URL was manually entered by the user (206) then the
client-side component of the invention performs a
spelling check on the protocol and domain-name portion of
the URL (208) and creates a list of potentially valid
URLs (210). If the created list is not empty (212) then
the list is displayed to the user in a hypertext format
(214). The user may then select one of the generated
URLs (216) or cancel the operation (216). If the user
chooses to cancel the operation (216) then the process
is terminated (218). If, however, the user chooses one of
the URLs displayed (216) then an attempt is made to
retrieve the desired document (228). If it is successfully
retrieved (226) then, following the flowchart connector
(232) to its entry point (242), the document is displayed
by the browser (240), the client-side component's
databases are updated (238) and the process is terminated
(236). If the document is not successfully retrieved then
the type of error encountered is evaluated (250). If the
error was "Server Not Found" then that message is
displayed using the prior art (224), the invalid URL is
removed from the list (222) and, if the list is not empty
(212), the process repeats from (214) until either the list
is empty (212), in which case the process is terminated
as described below, or a valid URL is selected (216) and
used to retrieve the document with processing
continuing at the C-C entry point (242). If the error was
"Document Not Found" then processing follows the connector C-D
(252) to its input at (248), where spell check operations
on the URL components begin.

If, however, the created list is empty (212) and has
never had any URLs contained within it (220), then the
"Server Not Found" error message is displayed using
the prior art (234) and the process is terminated (236).
Alternatively, if the list is now empty (212) but previously
held URLs (220) then the user is provided with a
message stating that the spelling check operation did not
yield any valid URLs (250) and the process is
terminated (230).
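
The select-and-retry loop of Figure 2 can be approximated by the following sketch. It is a simplification, not the flowchart itself: the callbacks choose and fetch are hypothetical stand-ins for the browser's user interface and network layer, all retrieval errors are treated alike, and the branch to Figure 3 on a "Document Not Found" error is left out.

```python
def try_candidates(candidates, choose, fetch):
    """Offer candidate URLs until one loads, the user cancels, or none remain.

    choose(urls) returns the URL picked by the user, or None to cancel.
    fetch(url) returns the document, or None if retrieval failed.
    """
    if not candidates:
        # The list never held any URLs: fall back to the standard error.
        return None, "server not found"
    remaining = list(candidates)
    while remaining:
        url = choose(remaining)
        if url is None:
            return None, "cancelled"
        document = fetch(url)
        if document is not None:
            return document, "retrieved"
        # The chosen URL was invalid: drop it and offer the rest again.
        remaining.remove(url)
    return None, "no valid URLs"


pages = {"http://www.sun.com/": "<html>home</html>"}
result = try_candidates(
    ["http://www.sum.com/", "http://www.sun.com/"],
    choose=lambda urls: urls[0],   # simulated user always picks the first entry
    fetch=pages.get,               # simulated network lookup
)
print(result)
```

Returning a status string alongside the document keeps the two empty-list outcomes of the text distinguishable: a list that was empty from the start and a list that was exhausted by failed attempts.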

A URL that does not return a "Server Not Found"
error (204) but does return a "Document Not Found"
error (244) follows the connector (246) to Figure 3's
connector input (302). If the user did not manually enter
the URL (304) then the error message "Document Not
Found" is displayed using the prior art (306) and the
process is terminated (308). Otherwise, the user did
manually enter the URL (304) and the components are
then spell checked (310) and a list of potential URLs is
generated (312). The components are those atomic words
to the right of the domain name, ignoring Common Gateway
Interface (CGI) arguments; for example, in the URL
"http://www.company.com/foo/bar/doc.html?argument" the
components are defined to be foo, bar, and doc.html.
If the list is not empty (314) then the list of URLs
is displayed to the user in a hypertext format (316)
where the user can either select one of the URLs or
cancel (318). Choosing cancel (318) will terminate the
process (320). Selecting a URL from the list (318)
results in an attempt to retrieve the document using the
selected URL (322). If the document is successfully
retrieved (324) then processing branches from the
connector C-C (326) to the input connector C-C (Figure 2,
242) where the document is displayed (Figure 2, 240),
the client-side component's databases are updated
(Figure 2, 238) and the process is terminated (Figure
2, 236).
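
The definition of URL components given above can be made concrete with a short sketch. The helper name url_components is hypothetical; the parsing relies on urlsplit from Python's standard library, which separates the query string (the CGI arguments) from the path.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Return (server name, list of components), ignoring CGI arguments."""
    parts = urlsplit(url)
    # Split the path on "/" and drop the empty pieces produced by the
    # leading slash or by doubled slashes.
    components = [piece for piece in parts.path.split("/") if piece]
    return parts.netloc, components


server, components = url_components(
    "http://www.company.com/foo/bar/doc.html?argument"
)
print(server, components)
```

For the example URL in the text this yields the server name www.company.com and the components foo, bar and doc.html.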

If the selected URL is insufficient to retrieve a
document (324) then, using the prior art, the "Document
Not Found" error message is displayed (328), the
invalid URL is removed from the list (330) and
processing resumes at (314), continuing until a document is
retrieved, the user cancels the operation or the list
becomes empty. If the list becomes empty (314) and the
list previously held constructed URLs (324) then the
user is given a message stating that none of the
constructed URLs were valid (336) and the process is
terminated (332).

Should the list of potentially valid URLs be empty
(314, 334) immediately after the spell check (310) and
list creation process (320) then the "Document Not
Found" error message will be displayed using the prior
art (306) and the process terminated (308).

Figures 4-7 are a detailed view of the processes
described in Figure 3 and labeled "spell check URL
components" (310) and "create list of potential URLs"
(312). The spell checking process detailed in Figures 4-7,
although presented specifically in the context of the client-side
component of the invention, works with minor and
obvious modifications in all the components where two text
strings are compared. The generic algorithm is given
below. Assume that one of String1 and String2 is the
version typed in by the user. The other will normally be
a version stored in the database.
1) A potential spelling check match between two
text strings, String1 and String2, is detailed as
follows:

Phase I: The process temporarily removes
each of the characters in String1 in turn, and if the
temporarily modified String1 is identical to String2 then a
potential match exists.

Phase II: The process temporarily removes
each of the characters in String2 in turn, and if the
temporarily modified String2 is identical to String1
then a potential match exists.

Phase III: Both String1 and String2 are
temporarily converted into lower case and all
non-alphanumeric characters are removed. If
String1 and String2 are identical then a
potential match exists.
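
The three phases above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention; the function names are hypothetical, and the sample strings in the usage lines are illustrative values, not database contents taken from the figures.

```python
def drop_one_matches(longer, shorter):
    """True if deleting exactly one character from `longer` gives `shorter`."""
    return any(
        longer[:i] + longer[i + 1:] == shorter for i in range(len(longer))
    )


def normalize(text):
    """Lower-case the text and keep only its alphanumeric characters."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def potential_match(string1, string2):
    """Return the phase ("I", "II" or "III") that matched, or None."""
    if drop_one_matches(string1, string2):        # Phase I
        return "I"
    if drop_one_matches(string2, string1):        # Phase II
        return "II"
    if normalize(string1) == normalize(string2):  # Phase III
        return "III"
    return None


print(potential_match("foot", "foo"))
print(potential_match("ba", "bar"))
print(potential_match("file~html", "file.html"))
```

Phase I catches an extra typed character, Phase II a missing one, and Phase III differences in case or punctuation. Identical strings would also pass Phase III, so a caller that wants only corrections should exclude exact matches beforehand.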

Figure 4 is the process flowchart for the spelling
checker operation of the client-side component of the
invention. Upon invocation, an empty list is created
(402) and the first URL component (as defined above) is
parsed from the complete URL (404). An attempt is then
made to retrieve a tuple {server name, component
name} from Database C (see Figure 8, item 840) where
the server name from the URL matches the server
name in Database C (406). If the attempt is not
successful (408) then Database C contains no matching
entries to the URL's server name and the spell check
process cannot proceed. The user is given a message
that the spell check process was unsuccessful
(435) and the process is halted (436).

Alternatively, if the attempt was successful (408)
then the URL component and the retrieved Database
component are assigned to the temporary variables
String1 and String2, respectively, and the
match indicator is set to FALSE (410). Then Phase I
(412) of the spelling check algorithm, as defined above,
is invoked. If a match was found, the match indicator will
have a value of TRUE. If the match indicator is still FALSE
after Phase I is completed, Phase II (414) is invoked.
Similarly, if Phase II does not result in a match, then
Phase III (416) is invoked. If Phase III also does not
yield a match then an attempt is made to retrieve the
next tuple from Database C where the server name
equals the URL server name (426). If the attempt is
successful (434) then the temporary variables are reset in
(410) and the process begins again at (412).

Should any Phase result in the match indicator
being set to TRUE (418, 420 and 422) then the
Database component that matched the URL component
replaces the URL component in the URL and the
revised URL is added to the list of potentially valid URLs
(424). An attempt is then made to retrieve another tuple
from the database (426) and if the attempt is successful
(434) then the temporary variables are reset in (410)
and the process begins again at (412).

When the attempt to get the next tuple from
Database C fails because the server name does not match
that of the URL (434) (i.e., all tuples with server name
equal to the URL server name have been examined)
then, if there are more URL components to process
(430), the next component is parsed from the URL (428)
and the process begins again at (406). If there are no
more URL components (430) then the process halts
(432), returning the list of potentially valid URLs to be
tested in Figure 3, item 314.
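
The loop of Figure 4 might look roughly like the sketch below, with Database C modelled as a list of (server name, component name) tuples. Several points are assumptions rather than statements of the source: each candidate differs from the typed URL in exactly one component, a component identical to the typed one is skipped so the typed URL is never offered back, and the names candidate_urls and potential_match are hypothetical.

```python
def potential_match(s1, s2):
    """Compact form of the three-phase comparison described earlier."""
    def drop_one(a, b):
        return any(a[:i] + a[i + 1:] == b for i in range(len(a)))

    def norm(s):
        return "".join(ch for ch in s.lower() if ch.isalnum())

    return drop_one(s1, s2) or drop_one(s2, s1) or norm(s1) == norm(s2)


def candidate_urls(protocol, server, components, database_c):
    """Build the list of potentially valid URLs for one typed URL."""
    candidates = []
    for index, component in enumerate(components):
        for db_server, db_component in database_c:
            if db_server != server:
                continue  # only tuples for the typed server are examined
            if db_component == component:
                continue  # assumption: an exact match is not a correction
            if potential_match(component, db_component):
                revised = components[:index] + [db_component] + components[index + 1:]
                candidates.append(protocol + "://" + server + "/" + "/".join(revised))
    return candidates


database_c = [
    ("www.sun.com", "foo"),
    ("www.sun.com", "bar"),
    ("www.other.com", "foot"),
]
print(candidate_urls("http", "www.sun.com", ["foot", "bar"], database_c))
```

An empty result corresponds to the case in which the spell check yields no potentially valid URLs for the typed server.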

Figure 5 provides a detailed process flowchart of
the Phase I spelling check for the client-side component
of the invention. In Phase I, the value for String1 is the
URL component and the value for String2 is the
component found in the Database C. Initializing a counter
(505) allows the process to sequentially (510) and
temporarily remove a single character at a time from String1
and place the result in a temporary variable (520). If the
temporary variable equals String2 (525) then the match
indicator is set to TRUE (535) and the process is
terminated (540). If the two strings are not equal (525) then
the counter is incremented to point to the next character
in String1 (530) and the process begins again at (510).
If the counter is greater than the length of String1 then
no match has been found and the process is terminated
(515).
Figure 6 provides a detailed process flowchart of
the Phase II spelling check for the client-side
component of the invention. In Phase II, the value for String2 is
the URL component and the value for String1 is the
component found in the Database C. Initializing a
counter (605) allows the process to sequentially (610) and
temporarily remove a single character at a time from
String1 and place the result in a temporary variable
(620). If the temporary variable equals String2 (625)
then the match indicator is set to TRUE (630) and the
process is terminated (635). If the two strings are not
equal (630) then the counter is incremented to point to
the next character in String1 (640) and the process
begins again at (610). If the counter is greater than the
length of String1 then no match has been found and the
process is terminated (615).

Figure 7 provides a detailed process flowchart of
the Phase III spelling check for the client-side
component of the invention. In Phase III, the value for String1
is the URL component and the value for String2 is the
component found in the Database C. Each string
variable is copied into a temporary variable for processing
purposes (705), and the temporary variables are then
converted to lower case (710) and then all
non-alphanumeric characters are removed from both (715). If the
temporary variables are equal (720) then the match
indicator is set to TRUE (730) and the process is
terminated (735). If the two strings are not equal (720) then
the process is terminated (725).

Figures 8A-8C illustrate the databases required by
the client-side component of the invention. The
database of Figure 8A (810) is a simple list of WWW
protocols that would only be changed when the browser itself
supported additional protocols. The database of Figure
8B (820) is also a simple list, but it is updated
dynamically with the server names of all URLs that have been
successfully accessed and viewed. The complete and
correct URL (830) is used as input data for the database
of Figure 8C (840), which contains tuples composed of
{server name, component name} as discussed above
and which is also updated dynamically. As shown in
Figure 8C, a plurality of {server name, component
name} tuples may be generated from a single URL.

Figure 9 provides a process flowchart for the updating of Databases
B and C in the client-side component of the invention. A WWW
document is retrieved by the browser via a URL, either supplied by
the user or embedded in another document as a hypertext reference
(905). The server name is parsed from the URL (910) and if the
server name does not already reside in Database B (920) it is added
to Database B (930). If the server name does currently exist in
Database B (920) then the URL is parsed into components (excluding
anything to the left of the server name and any CGI arguments) and
placed into an array (940). Initializing the array subscript
variable to zero (950) begins the process of adding tuples of the
form {server name, component name} into Database C. The loop begins
by incrementing the subscript variable by one (955). If the
component in the array referenced by the subscript does exist (960,
i.e., is not null) and the tuple does not already exist in Database
C (965), then the tuple is added to Database C (970) and the process
begins again at (955) by incrementing the subscript variable. If the
tuple does already exist in Database C (965) then it is skipped and
the process begins again at (955) by incrementing the subscript
variable. Once the entire array of components has been exhausted
(960) the process is terminated (975).
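
The update procedure can be sketched in Python as shown below. The function name `update_client_databases` and the representation of Databases B and C as in-memory sets are assumptions made for illustration. The sketch also assumes that components are recorded even when the server name has just been added, which the text does not state explicitly.

```python
from urllib.parse import urlsplit


def update_client_databases(url: str, database_b: set, database_c: set) -> None:
    """Record the server name and {server name, component name} tuples of a retrieved URL."""
    parts = urlsplit(url)
    server_name = parts.netloc                        # (910) parse the server name
    if server_name not in database_b:                 # (920) not yet known?
        database_b.add(server_name)                   # (930) add to Database B
    # (940) use the path only: the protocol (left of the server name) and the
    # query string (CGI arguments) are already separated out by urlsplit.
    components = [c for c in parts.path.split("/") if c]
    for component in components:                      # (955), (960) walk the array
        # (965), (970) adding to a set leaves an already existing tuple untouched
        database_c.add((server_name, component))
```

For the URL `http://www.sun.com/foo/bar/file.html` this sketch stores "www.sun.com" in Database B and three tuples in Database C, one each for "foo", "bar" and "file.html".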

In order to conceptualize the actual experience the user would have
employing the client-side component of the invention, the following
example is helpful. Assume that Database C in Figure 8 exists and
that the user has entered the following URL into his browser:

http://www.sun.com/foot/ba/file-html

The protocol and domain server have been entered correctly, but a
"Document Not Found" error occurs (Figure 2, item 244) and so
process control branches to Figure 3, item 302. Since the user did
manually enter the URL the spell checking process begins (Figure 3,
item 310) and control branches to Figure 4, item 402, where an empty
list of potential URLs is created. The first URL component is parsed
from the URL and is "foot". There does exist at least one tuple in
Database C which has a server name equal to that of the URL,
"www.sun.com". In this example the first tuple is {"www.sun.com",
"foo"}. It is clear that a Phase I spell check will result in a
match and the resulting constructed URL
"http://www.sun.com/foo/ba/file-html" is placed in the list of
potentially valid URLs. Similarly, as the looping mechanism allows
the processing of the second URL component, "ba", it is clear that a
Phase II spell check will result in a match between "ba" and the
tuple {"www.sun.com", "bar"}, which will lead, in turn, to the
creation of the URL "http://www.sun.com/foo/bar/file-html", which is
placed in the list. Finally, again utilizing the looping mechanisms,
it is found that the next component "file-html" can be successfully
spell checked using Phase III, resulting in the creation of the URL
"http://www.sun.com/foo/bar/file.html". The resulting list presented
to the user would contain the following constructed URLs:

http://www.sun.com/foo/ba/file-html 
http://www.sun.com/foo/bar/file-html
http://www.sun.com/foo/bar/file.html 

Although Figure 8 item 830 makes it clear by inspection that the
third and final URL on the list is the correct one, in actual
practice the user would only be assured that the final URL has been
the most thoroughly spell checked. But since the databases used to
perform the spell check are, by their methods of construction,
immediately out of date, it is not possible to say that the
completely spell checked URL, which did exist in the past, still
exists now. Nor can it be said that the first or second URLs on the
list are necessarily invalid. In fact, the only positive assertion
that can be made is that the URL initially entered by the user is
invalid because an error message was received from the server
stating that the document could not be found.

The method of spell checking protocols and server names would
operate in an analogous manner, utilizing the Phase I, II and III
algorithms. While the actual parameters supplied would differ from
the URL component spell check example, the transformations performed
on String1 and String2 remain the same. It would be obvious to one
skilled in the art how the spell checking already described would
apply to protocol and server name spell checking.

Figures 10 and 11 provide process flowcharts to 
describe a server-side spell checking component of the 
invention. The server-side spell checking embodiment 
uses the same conceptual approach as the client-side 
embodiment. 

In Figure 10, the process begins when a request for the WWW document
or page is received by the server (1005). If the page is found on
the server it is then sent to the requester using the prior art
(1015). If it is not found on the server (1010) then the spell
checking process starts by initializing an array subscript variable
to zero (1020) and then deconstructing the URL into an array of
components (1025). Since this occurs at the server it may be safely
assumed that the protocol and domain server names were correct and
can therefore be ignored, along with any CGI arguments. Next the
current directory is set to be the root directory for hypertext
documents on the server (1030). After incrementing the array
subscript (1035), the first component is compared to the entries in
the directory (1040). If the component is found in the directory
(1040) and the component is itself a directory name (1055) then the
current directory is changed to be that of the component (1060) and
the process begins again at (1035).

However, if the component is not found in the current directory
(1040) then the component is spell checked using the same generic
algorithm described above. The String1 argument would be the
component itself, while the String2 values would be taken from the
list of components found within the current directory one at a time.
Following the connector S-A (1050) to its entry point in Figure 11
(1105), if no matches were found during the spell check then the
list of potential URLs is empty and the user is sent a "Document Not
Found" error using the prior art (1115) and the process is
terminated (1120). Alternatively, if matches were found, indicated
by a non-zero length list, the server-side component would test each
URL on the list and remove those that could not be accessed (1125,
i.e., those that returned "Document Not Found" errors). If the
system administrator at the server side maintains a simple list of
"hidden" files, files that can only be accessed by entering the
correct URL at the start, then the component will remove URLs from
the spell checking list that occur in the "hidden" files list
(1130). If after (1125) and (1130) the URL list is empty, the
server-side component would send a "Document Not Found" error
message to the user using the prior art (1115) and the process would
terminate (1120). If, however, the list is not empty then the server
would construct a new page, in HTML format, that would contain a
note to the user indicating that the requested URL was not found but
that instead the server had compiled a list of possible alternative
URLs. The list of alternative URLs would be displayed in hypertext
format with those components that had been replaced by the
server-side spell checking embodiment displayed in a bold typeface
as a visual aid to the user in determining what part of his original
URL had been modified. Additionally, each suggested URL would be
followed by the document title as a further selection aid for the
user (1140).
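
The server-side walk of Figures 10 and 11 might be sketched in Python as follows. Several simplifications are assumed here: the document root is modelled as a nested dictionary rather than a real file system, `loose_match` is a stand-in for the generic spell check algorithm, and the names `resolve` and `suggest_paths` are hypothetical. Construction of the HTML page (1140) is omitted.

```python
def loose_match(string1, string2):
    """Stand-in matcher (assumption): equal after lower-casing and dropping punctuation."""
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return normalize(string1) == normalize(string2)


def resolve(tree, components):
    """Return the node at the given path, or None if the path does not exist."""
    node = tree
    for name in components:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node


def suggest_paths(tree, components, hidden, matches=loose_match):
    """Return candidate paths for the first component that is not found."""
    current = tree                                        # (1030) start at the document root
    for index, component in enumerate(components):        # (1035) next component
        if component in current:                          # (1040) found in current directory
            if isinstance(current[component], dict):      # (1055) it is a directory
                current = current[component]              # (1060) descend
                continue
            return []                                     # an existing file: nothing to correct
        # Not found: spell check against each entry of the current directory.
        candidates = [components[:index] + [entry] + components[index + 1:]
                      for entry in current if matches(component, entry)]
        # (1125) keep only candidates that can actually be accessed
        reachable = [path for path in candidates if resolve(tree, path) is not None]
        urls = ["/" + "/".join(path) for path in reachable]
        return [url for url in urls if url not in hidden]  # (1130) drop hidden files
    return []
```

An empty result corresponds to the "Document Not Found" branch (1115), and a non-empty result is the list from which the server would build its page of alternatives.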

The server-side creation of new WWW pages is well known in the art.
For example, many search engines on the WWW return HTML pages to the
user that contain hypertext links to other pages, with appropriate
bolding or highlighting of keywords (from the search terms) that
appear in the page's title.

The collaborative (third) component of the invention performs
sophisticated spell checking operations on an ISP's server or an
organization's proxy server within a firewalled domain. For purposes
of discussion these shall be referred to as "the service". The
collaborative component is illustrated and described in Figures
12-15.

Figure 12 provides a process flowchart for the use of the
collaborative spell checker component. The service receives a
response from the remote server due to a request to retrieve a
document using a URL sent by the user (1205). If the response
received is the error "Server Not Found" then the server name is
spell checked. If the server portion of the URL does not reside in
Database A, which is maintained at the service's site, then the
spell check algorithm described previously is employed to spell
check the server name from the URL against server names in Database
A (1220). That is, String1 would be the URL server name and String2
would take on successive values from Database A until a match is
found or there are no more database entries. If candidate server
names are found during the spell check (1225) then the service would
construct a new HTML page of candidate hypertext URLs, composed of
the candidate server names and the components and CGI arguments to
the right of the invalid server name in the original URL, which is
then sent to the user (1240) and the process is terminated (1245).
As with the server-side component, the collaborative component will
encode the HTML page so that the new candidate server names are
bolded or highlighted to provide a visual aid to the user.
Additionally, a message would appear above the list to inform the
user that the original server name was invalid. However, if no
candidate server names are found (1225) then the error message
"Server Not Found" is returned to the user's browser using the prior
art (1230) and the process is terminated (1235).
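
The server-name branch can be illustrated with the Python sketch below. The function name `candidate_urls_for_server`, the stand-in matcher `loose_match` and the representation of Database A as a set of names are all assumptions; the sketch only shows how candidate URLs keep the rest of the original URL while the server name is swapped.

```python
from urllib.parse import urlsplit, urlunsplit


def loose_match(string1, string2):
    """Stand-in matcher (assumption): equal after lower-casing and dropping punctuation."""
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return normalize(string1) == normalize(string2)


def candidate_urls_for_server(url, database_a, matches=loose_match):
    """Build candidate URLs by replacing an unknown server name with known ones."""
    parts = urlsplit(url)
    if parts.netloc in database_a:
        return []                                   # server name is already known
    return [urlunsplit(parts._replace(netloc=server))   # keep path and CGI arguments
            for server in sorted(database_a)            # (1220) successive String2 values
            if matches(parts.netloc, server)]
```

An empty list corresponds to returning "Server Not Found" (1230); otherwise the list would feed the HTML page sent to the user (1240).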

Alternatively, if the service receives a "Document Not Found" error
as a response from the remote server (1205), then the collaborative
component will spell check the user-supplied URL. This is
accomplished by using the service's Database B, which contains all
valid URLs that have been retrieved from remote servers and passed
back to users via the service for some specified period of time. The
generic spell check algorithm described above can again be employed.
But in this embodiment the entire URL is checked as a single string.
So the variable String1 would contain the user-supplied URL and
String2 would take on successive URL values from Database B, where
the server name of the user-supplied URL is equal to the server name
portion of the URL in Database B, until a match is found or there
are no more database entries (1275). As with both the client-side
and server-side components, a list of candidate URLs is created
based upon the results of the spell check. If the list is empty then
the "Document Not Found" error is passed on to the user via the
prior art (1284) and the process is terminated (1286). If the list
is not empty then the service would construct a new HTML page of
candidate hypertext URLs, which is then sent to the user (1282) and
the process is terminated (1288). As with the server-side component,
the collaborative component will encode the HTML page so that the
new candidate URLs are bolded or highlighted to provide a visual aid
to the user. Additionally, a message would appear above the list to
inform the user that the original URL was invalid.
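
A rough Python sketch of the whole-URL check and the resulting page is shown below. The names `candidate_urls` and `render_page`, the stand-in matcher `loose_match`, the dictionary form of Database B and the exact HTML markup are assumptions chosen for illustration only.

```python
from html import escape
from urllib.parse import urlsplit


def loose_match(string1, string2):
    """Stand-in matcher (assumption): equal after lower-casing and dropping punctuation."""
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return normalize(string1) == normalize(string2)


def candidate_urls(url, database_b, matches=loose_match):
    """Compare the entire URL against Database B entries for the same server name."""
    server = urlsplit(url).netloc
    return [known for known in database_b                  # (1275) successive String2 values
            if urlsplit(known).netloc == server and matches(url, known)]


def render_page(original, candidates):
    """Build a small HTML page with a note above a list of bolded hypertext URLs."""
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}"><b>{escape(u)}</b></a></li>'
        for u in candidates)
    return (f"<p>The URL {escape(original)} was not found. "
            f"Possible alternatives:</p>\n<ul>\n{items}\n</ul>")
```

Restricting the comparison to entries with the same server name keeps the number of string comparisons small, which matters because Database B holds the URLs of all users of the service.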

Of course, as the service successfully retrieves WWW pages, its
databases (see Figure 14) will continue to grow. In order to control
this growth, a pruning algorithm has been devised. It is depicted in
Figure 13. The pruning occurs during low-load time for the service
and is based upon defined aging parameters, which may be "hard
coded" values that may not be changed by the pruning algorithm, or
may be "seed values" that are dynamically changed during operations
as conditions change. If dynamic rule modification is not being
employed (1305) and if the spell checks are taking longer than the
maximum time permitted (1310) the system operators are notified that
the pre-defined time periods for holding database entries are no
longer suitable (1325) and the process is terminated (1330).
Alternatively, if the time to perform a spell check is less than the
maximum time permitted (1310) then the system uses the pre-defined
time periods to determine which database entries should be removed.

If dynamic rule modification is employed (1305) then if the storage
capacity threshold has been met or exceeded (1335) then the aging
values are reduced (1340) so that more of the older entries are
removed. If storage capacity is still beneath the threshold (1335)
but the time to perform a spell check has exceeded the maximum time
allowed (1345) then, again, the aging values are reduced (1350).
Alternatively, if the storage capacity is well below its threshold
value and the time needed to perform spell checks is also
significantly less than allowed (1355) then the aging values are
increased (1360). If the storage capacity is below its threshold and
the time needed to perform spell checks is less than allowed (but
not significantly) then the process is terminated (1365). The
flowchart, beginning at (1335), indicates that only if the
thresholds are exceeded or if the system is operating substantially
under those thresholds are the aging values changed; it is possible
that none of the conditions are met for changing the aging values
and the process simply terminates (1365).
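
The dynamic rule modification can be sketched in Python as follows. The text does not quantify "well below" or "significantly less", nor the size of each adjustment, so the parameters `slack` and `step`, the one-day floor and the expression of aging values in days are all assumptions. The function names are likewise hypothetical.

```python
from datetime import date, timedelta


def adjust_aging_days(aging_days, storage_used, storage_limit,
                      check_time, max_check_time, step=1, slack=0.5):
    """Return the new aging value after one pass of the dynamic rules."""
    if storage_used >= storage_limit:                  # (1335) capacity met or exceeded
        return max(1, aging_days - step)               # (1340) prune more aggressively
    if check_time > max_check_time:                    # (1345) spell checks too slow
        return max(1, aging_days - step)               # (1350)
    if (storage_used < slack * storage_limit
            and check_time < slack * max_check_time):  # (1355) well under both limits
        return aging_days + step                       # (1360) keep entries longer
    return aging_days                                  # (1365) leave unchanged


def prune(database, today, aging_days):
    """Keep only entries whose last-access date is within the aging period."""
    cutoff = today - timedelta(days=aging_days)
    return {key: last for key, last in database.items() if last >= cutoff}
```

With the defaults above, a service at its storage limit would shorten a 30-day aging value to 29 days, while a service using a tenth of its capacity with fast spell checks would lengthen it to 31 days.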

Figure 14 illustrates the databases required by the 
collaborative component of the invention. Database A 
contains previously valid server names and the dates 
they were last accessed (1405). Database B contains 
previously valid URLs for documents that have been 
successfully retrieved and the dates they were most 
recently retrieved (1420). 

Figure 15 provides a process flowchart for the updating of the
collaborative component's databases. When a document is successfully
retrieved from a remote server and sent to the user (1505) the
server name is parsed from the URL (1510). If the server name exists
in Database A (1515) then the date field in the database record
corresponding to the server name is updated to the current date
(1520). If the server name does not exist in Database A (1515) then
the server name and the current date are added to the database
(1525). If the URL is found in Database B (1530) then the date field
in the database record corresponding to the URL is updated to the
current date (1535) and the process is terminated (1545). If the URL
does not exist in Database B (1530) then the URL and current date
are added to the database (1540) and the process is terminated
(1545).
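
A compact Python sketch of this update is given below, assuming each database is a dictionary that maps a server name or URL to the date it was last retrieved. The function name `record_success` is hypothetical.

```python
from datetime import date
from urllib.parse import urlsplit


def record_success(url, database_a, database_b, today):
    """Insert or refresh the entries for a successfully retrieved document."""
    server_name = urlsplit(url).netloc     # (1510) parse the server name
    # (1515)-(1525) a dictionary assignment either updates the date of an
    # existing record or adds a new record with the current date.
    database_a[server_name] = today
    # (1530)-(1540) the same applies to the full URL in Database B.
    database_b[url] = today
```

Because assignment covers both the "exists" and "does not exist" branches, the two tests at (1515) and (1530) do not appear as explicit conditions in this sketch.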

Figure 16A illustrates a computer of a type suitable for carrying
out the invention. Viewed externally in Figure 16A, a computer
system has a central processing unit 1600 having disk drives 1610A
and 1610B. Disk drive indications 1610A and 1610B are merely
symbolic of a number of disk drives which might be accommodated by
the computer system. Typically, these would include a floppy disk
drive such as 1610A, a hard disk drive (not shown externally) and a
CD ROM drive indicated by slot 1610B. The number and type of drives
varies, typically, with different computer configurations. The
computer has the display 1620 upon which information is displayed. A
keyboard 1630 and a mouse 1640 are typically also available as input
devices. Preferably, the computer illustrated in Figure 16A is a
SPARC workstation from Sun Microsystems, Inc.

Figure 16B illustrates a block diagram of the internal hardware of
the computer of Figure 16A. A bus 1650 serves as the main
information highway interconnecting the other components of the
computer. CPU 1655 is the central processing unit of the system,
performing calculations and logic operations required to execute
programs. Read only memory (1660) and random access memory (1665)
constitute the main memory of the computer. Disk controller 1670
interfaces one or more disk drives to the system bus 1650. These
disk drives may be floppy disk drives, such as 1673, internal or
external hard drives, such as 1672, or CD ROM or DVD (Digital Video
Disk) drives such as 1671. A display interface 1675 interfaces a
display 1620 and permits information from the bus to be viewed on
the display. Communications with external devices can occur over
communications port 1685.

Figure 16C illustrates an exemplary memory medium which can be used
with drives such as 1673 in Figure 16B or 1610A in Figure 16A.
Typically, memory media such as a floppy disk, or a CD ROM, or a
Digital Video Disk will contain the program information for
controlling the computer to enable the computer to perform its
functions in accordance with the invention.

In this disclosure, there is shown and described only the preferred
embodiment of the invention, but, as aforementioned, it is to be
understood that the invention is capable of use in various other
combinations and environments and is capable of changes or
modifications within the scope of the inventive concept as expressed
herein.

Claims 

1. Apparatus for checking spelling of network 
addresses, comprising: 

a. a first database containing valid network pro- 
tocol names; 

b. a second database containing valid network 
server names; 

c. a third database containing valid component 
names; and 

d. a computer configured to analyze a network 
address, used in an attempt to establish a con- 
nection to that address but which did not result 
in a connection, to compare portions of that 
address with one of said first, second or third 
databases containing corresponding informa- 
tion and present to a user one or more alterna- 
tive spellings of that address if a portion of that 
address does not match identically a valid entry 
in the database. 

2. Apparatus of claim 1 in which said one or more 
alternative spellings are presented in a form so that 
a user may select one of said alternative spellings
with an input device and attempt to connect again 
using the selected alternative spelling. 

3. Apparatus for checking spelling of network
addresses received at a server having a hierarchical
directory structure from a remote user, comprising:

a. a database containing names of hidden files; and

b. a computer configured to analyze network
addresses term by term beyond the server address,
to compare portions of an address with corresponding
portions of the hierarchical directory and to present
to said remote user one or more alternative spellings
if a directory or file name does not match identically
a valid entry in the hierarchical directory, unless
such an alternative spelling is contained in said
database.

4. Apparatus of claim 3 in which said one or more
alternative spellings are presented in a form so that
said remote user can select one of said alternative
spellings with an input device and attempt to connect
again using the selected alternative spelling.

5. Apparatus for checking spelling of network
addresses received from a remote user at a network
access provider, comprising:

a. a database containing remote server names
to which users have successfully connected;

b. a database containing network addresses;
and

c. a computer configured to analyze a network
address, received from a remote user which did
not result in a connection, to compare portions of
that address with portions of each database
containing corresponding information and present to
a user one or more alternative spellings if a
portion of a network address does not identically
match a valid entry in the databases.

6. Apparatus of claim 5 in which said one or more
alternative spellings are presented in a form so that
said remote user can select one of said alternative
spellings with an input device and attempt to connect
again using the selected alternative spelling.

7. A system for checking spelling of network
addresses received from a user, comprising at least
any two of a client spell checker, a network access
provider spell checker and a server spell checker
resident on respective computers connected to a
communications network.

8. A system for checking spelling of network 
addresses received from a user, comprising: 

a. a network; and

b. a computer connected to said network con- 
figured to spell check network addresses and 
to suggest alternative spellings. 

9. The system of claim 8 in which said computer is
operated as a client in a client-server mode. 

10. The system of claim 8 in which said computer is 
operated as a server in a client-server mode.

11. The system of claim 8 in which said computer is 
operated as a network access provider. 
12. A method of checking spelling of network
addresses, comprising the steps of:

a. performing a step of comparing a portion
of a network address received from a user
which did not result in a connection with entries
in a database containing corresponding portions
of network addresses which had previously
resulted in connections;

b. performing a step of identifying candidate
matches from the database which match
imperfectly a portion of a network address; and

c. performing a step of, when one or more
candidate matches is found, providing a list of
said candidate matches to said user.

13. The method of claim 12, in which candidate
matches are provided to a user in a hypertext format
which permits selection and use of one of said
candidate matches in a connection request by clicking
on a candidate match.

14. A method of checking spelling of network
addresses in a server having a hierarchical
directory, comprising the steps of:

a. performing a step of comparing a portion of
a network address received from a remote user
which did not result in access to a document on
the server with corresponding portions of the
hierarchical directory, and

b. performing a step of presenting to said
remote user alternative spellings if a directory
or file name does not match identically a valid
entry in the hierarchical directory.

15. The method of claim 14 in which the names of hid- 
den files are excluded from the list of alternative 
spellings presented to a user. 


16. A method of checking spelling of network
addresses at a network access provider, comprising
the steps of:

a. providing an element for performing the step
of storing remote server names and network
addresses, to which network access provider
users have successfully connected, in one or
more databases;

b. providing an element for performing the step
of comparing portions of an address received
from a network access provider user which did
not result in a connection, with corresponding
portions of said database; and

c. providing an element for performing the step
of presenting to said network access provider
user alternative spellings if a portion of an
address does not identically match a valid entry
in the database.

17. A computer program product, comprising: 

a. a memory medium; and

b. a computer program stored on said memory 
medium, said computer program containing 
instructions for comparing a portion of a net- 
work address received from a user, which did 
not result in a connection, with entries in a 
database of network addresses, which previ- 
ously resulted in a connection, to identify candi- 
date matches which match imperfectly the 
network address received from the user, and 
when one or more candidate matches is found, 
providing a list of said candidate matches to 
said user. 

18. A computer program product for checking spelling 
of network addresses in a server having a hierarchi- 
cal directory, comprising: 

a. a memory medium; and 

b. a computer program stored on said memory
medium, said computer program containing
instructions for comparing a portion of a network
address received from a user which did
not result in access to a document on the
server with corresponding portions of the hierarchical
directory and to present to a user alternative
spellings if a directory or file name does
not match identically a valid entry in the hierarchical
directory.

19. A computer program product for checking spelling
of network addresses in a server having a hierarchical
directory, comprising:

a. a memory medium; and

b. a computer program stored on said memory
medium, said computer program containing
instructions for storing remote server names
and network addresses to which users have
successfully connected in one or more databases,
comparing portions of an address
received from a user which did not result in a
connection with corresponding portions of said
database; and presenting to a user alternative
spellings if a portion of an address does not
match identically a valid entry in the database.